1  Introduction

In  this  report  we  begin  by  restating  the  motivation  for  our  work,  and  review  the  project  objectives.  We  present  our 
results  and  follow  each  research  thrust  with  potential  future  areas  of  work.  We  conclude  with  a  list  of  publications 
supported  by  the  grant,  and  a  list  of  project  personnel. 

1.1  Review  of  motivation 

Over  the  past  several  years,  sensors  and  signal  processing  algorithms  and  hardware  have  been  under  increasing 
pressure  to  accommodate: 

•  ever  larger  and  higher-dimensional  data  sets,  including  samples  of  wideband  radio  frequency  (RF)  signals,  high- 
resolution  images  and  video,  volumetric  data,  three-dimensional  (3-D)  video,  4-D+  lightfields,  and  beyond; 

•  ever  faster  capture,  sampling,  and  processing  rates; 

•  ever lower power consumption, to permit remote, battery-powered operation for long periods;

•  networked  sensing  schemes  for  spatially  distributed  sources  and  phenomena; 

•  communication  over  ever  more  difficult  channels;  and 

•  radically  new  sensing  modalities. 

Fortunately,  over  the  same  time  period,  there  has  been  an  enormous  increase  in  computational  power  and  data 
storage  thanks  to  Moore’s  Law,  which  provides  a  new  angle  to  tackle  these  challenges. 

We  are  currently  on  the  verge  of  moving  from  a  digital  signal  processing  (DSP)  paradigm,  where  analog  signals  are 
sampled  periodically  to  create  their  digital  counterparts  for  processing,  to  a  computational  signal  processing  (CSP) 
paradigm,  where  analog  signals  will  be  converted  (often  directly)  to  any  of  a  number  of  intermediate  representations 
for processing using computational and optimization techniques. At the foundation of CSP lie new uncertainty
principles that generalize Heisenberg’s between the time and frequency domains, the concepts of compressibility
and sparsity, and the new theory of compressive sensing (CS).

The  enabling  idea  is  that  natural  signals  and  other  data  often  contain  some  type  of  structure  that  makes  them 
compressible. A compressible signal of length $N$ can be well approximated using $K$ real numbers, with $K \ll N$.
Many  audio  signals,  natural  images,  and  manmade  signals,  for  example,  are  compressed  by  a  factor  of  10  or 
more  when  expressed  in  terms  of  their  largest  Fourier  or  wavelet  coefficients.  The  usual  approach  to  acquiring  a 
compressible  signal  is  to  take  measurements  in  the  Dirac  basis  and  then  use  a  nonlinear  algorithm,  such  as  a  speech, 
MP3,  JPEG,  or  MPEG  coder,  to  obtain  a  more  efficient  approximation. 

But  this  approach  is  not  practicable  if  the  signal  is  presented  at  a  high  rate  (as  in  a  radar  system)  or  if  the 
measurement  device  has  limited  computational  resources  (as  in  a  sensor  network).  Fortunately,  over  the  past  two 
years  a  new  theory  of  Compressive  Sensing  (CS)  has  emerged,  in  which  an  incoherent  linear  projection  is  used  to 
acquire an efficient representation of a compressible signal directly using just $M \approx K \ll N$ measurements [1-6].
Interestingly,  random  projections  play  a  major  role.  The  signal  is  then  reconstructed  by  solving  an  inverse  problem 
either  through  a  linear  program  or  a  greedy  pursuit. 

CS  offers  a  fresh  approach  to  framing  and  solving  a  number  of  timely  and  challenging  problems  in  signal  and 
image  processing  and  imaging.  In  this  project,  we  have  explored  its  potential  as  a  dimensionality  reduction  tool, 
as  a  distributed  source  coding  system,  and  as  a  sensing  framework  for  radar  systems. 

Figure 1: Pseudo-random demodulation scheme for AIC.


1.2  Review  of  project  objectives 

This  project  aimed  at  exploring  the  foundations  and  applications  of  CS  in  signal  and  image  processing  and  imaging 
problems.  Specifically,  we  investigated: 

1.  Information scalability of CS.  The CS literature has focused almost exclusively on problems in signal
reconstruction, approximation, and estimation in noise. However, random projections have a long history as a
dimensionality reduction tool for more general statistical modeling and classification problems [7]. We explored
the information scalability of CS to a range of statistical inference tasks. In particular, we investigated how
CS principles can achieve direct, high-accuracy target detection/recognition from CS measurements without
reconstructing the signal/image involved and using fewer measurements. We also investigated
“analog-to-information conversion,” illuminating the benefits of applying CS to high-rate analog-to-digital conversion
problems.

2.  Distributed sensing and encoding using CS.  The CS literature has focused almost exclusively on problems
involving single sensors, signals, or images. However, many important applications involve distributed
networks  or  arrays  of  sensors.  We  developed  theory  and  algorithms  for  distributed  compressive  sensing  (DCS) 
that  enable  new  signal  acquisition  and  coding  algorithms  for  multi-signal  ensembles  and  sensor  networks  that 
exploit  both  intra-  and  inter-signal  correlation  structures.  Specifically,  we  used  graphical  models  to  derive 
explicit  performance  bounds. 

3.  CS-based radar signal processing and imaging.  We investigated how CS concepts can enable new and
simplified  kinds  of  radar  imaging  hardware  and  algorithms.  We  formalized  our  approach  to  1-D  CS  radar  and 
expanded  our  existing  work  to  a  2-D  SAR  CS  imaging  problem.  We  anticipate  that  our  techniques  will  be 
particularly  appropriate  for  inexpensive  networks/arrays  of  radar  receivers. 

2  Information  scalability  of  CS 

2.1  Summary  of  results 

Our work on information scalability was centered on two thrusts. The first was the theory and application of
analog-to-information conversion. We applied CS principles to perform accurate analog-to-digital conversion on high-rate
signals,  using  a  sub-Nyquist  sampling  rate.  We  developed  new  theory,  algorithms,  performance  bounds,  and  a 
prototype  implementation  for  an  analog-to-information  converter  based  on  random  demodulation.  The  architecture 
is  particularly  apropos  for  wideband  signals  that  are  sparse  in  the  time-frequency  plane.  Our  end-to-end  simulations 
of  a  complete  transistor-level  implementation  proved  the  concept  under  the  effect  of  circuit  nonidealities  [8] . 

The  second  thrust  was  applying  CS  principles  to  detection  and  classification  problems.  Our  approach  was  based 
on  the  generalized  likelihood  ratio  test;  in  the  case  of  image  classification,  it  exploits  the  fact  that  a  set  of  images 
of  a  fixed  scene  under  varying  articulation  parameters  forms  a  low-dimensional,  nonlinear  manifold.  Exploiting 
recent results showing that random projections stably embed a smooth manifold in a lower-dimensional space, we
developed  the  multiscale  smashed  filter  as  a  compressive  analog  of  the  familiar  matched  filter  classifier.  In  a  practical 
target  classification  problem  using  a  single-pixel  camera  that  directly  acquires  compressive  image  projections,  we 
achieved  high  classification  rates  using  many  fewer  measurements  than  the  dimensionality  of  the  images. 

2.2  Analog-to-information  conversion

2.2.1  Compressive  sensing  background 

Compressive Sensing (CS) provides a framework for acquisition of an $N \times 1$ discrete-time signal vector $x = \Psi\alpha$
that is compressible in some sparsity basis or frame matrix $\Psi$ (where each column is a basis or frame vector $\psi_i$). By
compressible we mean that the entries of $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]^T$, when sorted from largest to smallest, decay rapidly
to zero; such a signal is well approximated using a $K$-term representation, consisting of the terms of $\alpha$ with the $K$
largest magnitudes while setting all the other terms to zero. Note that, by definition, signals that have only a few
nonzero coefficients are compressible as well.

Figure 2: Comparison of spectrograms obtained from full and CS compressed versions of a frequency hopping signal. The
signal is a single side-band AM signal, whose carrier frequency changes periodically over time. (left) Spectrogram from
original signal; (right) spectrogram from CS reconstruction with measurement rate equal to 25% of Nyquist rate.

The CS framework [4, 6] demonstrates that a signal that is compressible in one basis $\Psi$ can be recovered to
a quality similar to that of a $K$-term approximation from $M = cK$ nonadaptive linear projections onto a second
basis $\Phi$ that is incoherent with the first, with $c$ a small overmeasuring constant. By incoherent, we mean that
the rows $\phi_j$ of the matrix $\Phi$ cannot sparsely represent the elements of the sparsity-inducing basis $\Psi$, and vice
versa. Thus, rather than measuring the $N$-point signal $x$ directly, we acquire the $M \ll N$ linear projections
$y = \Phi x + n = \Phi\Psi\alpha + n$, where $n$ represents the noise inherent to the measurement process. For brevity, we define
the $M \times N$ matrix $\Theta = \Phi\Psi$.

Since $M < N$, recovery of the signal $x$ from the measurements $y$ is ill-posed in general; however, the additional
assumption of signal compressibility in the basis $\Psi$ makes recovery both feasible and practical. The recovery of the
set of transform coefficients $\alpha$ can be achieved through optimization [9] by searching for the signal with the smallest
$\ell_1$ norm for the coefficient vector $\alpha$ that agrees with the $M$ observed measurements in $y$ within the margin of error
given by the magnitude of the noise $\epsilon \ge \|n\|_2$:

$$\hat{\alpha} = \arg\min \|\alpha\|_1 \quad \text{such that} \quad \|y - \Theta\alpha\|_2 \le \epsilon \qquad (1)$$

This optimization problem, also known as Basis Pursuit with Denoising (BPDN) [10], can be solved with traditional
convex programming techniques whose computational complexities are polynomial in $N$. At the expense of
slightly  more  measurements,  iterative  greedy  algorithms  like  Orthogonal  Matching  Pursuit  (OMP)  [11]  can  also  be 
applied  to  the  recovery  problem. 
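
To make the greedy alternative concrete, the following Python sketch runs a textbook form of OMP on a noiseless synthetic problem. It is an illustration under stated assumptions, not the solver used in this project: the Gaussian matrix `Theta`, the problem sizes, and the function name `omp` are all hypothetical choices made for the example.

```python
import numpy as np

def omp(Theta, y, K):
    """Greedy recovery of a K-sparse coefficient vector from y = Theta @ alpha."""
    M, N = Theta.shape
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(K):
        # Select the column most correlated with the current residual.
        correlations = np.abs(Theta.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a column
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(Theta[:, support], y, rcond=None)
        residual = y - Theta[:, support] @ coef
    alpha_hat = np.zeros(N)
    alpha_hat[support] = coef
    return alpha_hat

# Synthetic test problem (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
N, M, K = 256, 128, 5
Theta = rng.standard_normal((M, N)) / np.sqrt(M)
alpha = np.zeros(N)
alpha[rng.choice(N, K, replace=False)] = rng.choice([-1.0, 1.0], K) * (1 + rng.random(K))
y = Theta @ alpha

alpha_hat = omp(Theta, y, K)
recovery_error = np.linalg.norm(alpha_hat - alpha)
```

Each iteration costs one matrix-vector product and one small least-squares solve, which is why greedy pursuits are attractive when a full convex program is too expensive.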

2.2.2  Real-time  CS 

Our signal acquisition system consists of three main components: demodulation, filtering, and uniform sampling.
As seen in Figure 1, the signal is modulated by a pseudo-random maximal-length PN sequence of ±1’s. We call this
the chipping sequence $p_c(t)$; its chipping rate, i.e., the rate of change of symbols, must be faster than the Nyquist
rate for the input signal. The purpose of such modulation is to provide the randomness necessary for successful CS
recovery. The modulation is followed by a low-pass filter with impulse response $h(t)$. Finally, the signal is sampled
at rate $M$ using a traditional ADC. This system can be formulated as a CS measurement matrix as seen in [12].
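
One simple discrete-time model of this chain, offered only as a sketch, writes the measurement matrix as the product of an accumulate-and-dump matrix and a diagonal chipping matrix. Two assumptions are made here that the text does not state: the low-pass filter is idealized as an integrator that sums `N // M` consecutive Nyquist-rate samples, and an i.i.d. random ±1 sequence stands in for the maximal-length PN sequence. The function name is hypothetical.

```python
import numpy as np

def random_demodulator_matrix(N, M, rng):
    """Return an M x N matrix modeling chipping followed by integrate-and-dump."""
    assert N % M == 0, "this sketch assumes M divides N"
    # Random +/-1 chips: a stand-in for the PN chipping sequence p_c(t).
    chips = rng.choice([-1.0, 1.0], size=N)
    D = np.diag(chips)
    # Integrate-and-dump: each row sums N // M consecutive samples.
    H = np.kron(np.eye(M), np.ones((1, N // M)))
    return H @ D, chips

rng = np.random.default_rng(1)
N, M = 64, 16
Phi, chips = random_demodulator_matrix(N, M, rng)

# Apply the matrix to a test tone and compare with a direct simulation.
x = np.cos(2 * np.pi * 5 * np.arange(N) / N)
y = Phi @ x
y_direct = (chips * x).reshape(M, N // M).sum(axis=1)
```

In this model every row of `Phi` has `N // M` nonzero entries of magnitude one, so the matrix can be stored and applied very cheaply.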

2.2.3  Reconstruction  for  analog  time-frequency  sparse  signals 

We consider the case of wideband signals that are time-frequency sparse in the sense that at each point in time they
are well-approximated by a few local sinusoids of constant frequency. As a practical example, consider sampling
a frequency-hopping communications signal that consists of a sequence of windowed sinusoids with frequencies
distributed between $f_1$ and $f_2$ Hz. The bandwidth of this signal is $f_2 - f_1$ Hz, which dictates sampling above the
Nyquist rate of $2(f_2 - f_1)$ Hz to avoid aliasing. We are interested in the case where $f_2 - f_1$ is very large and the
signal is compressible, since the AIC will then achieve much better performance than an ADC.

It is well known that signals that are localized in the time-frequency domain have compact transformations
under the Gabor transform, which is defined as

$$\hat{x}(\tau, f) = \langle x(t), \psi_{\tau,f}(t) \rangle,$$

i.e., the coefficient measures the inner product of the signal with the Gabor atoms

$$\psi_{\tau,f}(t) = g(t - \tau)\, e^{\pm j 2\pi f t},$$

where $g$ is a window function with $\|g\|_2 = 1$ [13]. We will leverage this compact nature during the reconstruction
of the signal to obtain a representation directly in the time-frequency domain, without performing reconstruction
of the original time signal.

The conventional tool for this class of signals is a spectrogram. A spectrogram is assembled using the magnitudes
of short-time Fourier transforms (STFTs), which perform Fourier analysis of shifted, windowed versions of the input
signal to establish frequency content at local time neighborhoods. The STFT is written as

$$\int_{-\infty}^{\infty} x(t)\, g(t - l\tau)\, e^{-j 2\pi m t / n}\, dt$$

for $l = 1, \ldots, n/\tau$ and $m = 1, \ldots, n$. This tool provides a visual representation of the Fourier spectrum of a signal
over time. The spectrogram can be thought of as a uniform sampling of the coefficients of the signal under the
Gabor transform. Thus, by utilizing a dictionary matrix $\Psi$ consisting of a sampling of the Gabor atoms, the
signal $x$ can be represented using a sparse or compressible vector $\alpha$ under the dictionary $\Psi$. In this fashion,
our sparse reconstruction of the signal will be obtained directly in the time-frequency domain: we observe the
spectrogram directly without requiring reconstruction of the original signal. An example is shown in Figure 2(a)
where  the  spectrogram  of  a  single  sideband  AM  frequency  hopping  signal  is  displayed.  We  see  that  for  small  ranges 
of  time,  the  signal  is  well  identified  by  its  carrier  frequency,  but  when  we  consider  the  whole  signal  length  there 
are  many  carriers  to  isolate.  The  spectrogram  pictured  in  Figure  2(b)  shows  reconstruction  of  the  signal  from 
AIC  measurements  using  a  Gabor  dictionary  with  a  boxcar  window.  The  carriers  in  the  reconstruction  are  easily 
identified.  The  noise  appears  due  to  the  non-sparse  structure  of  the  input  signal;  however,  its  compressibility  allows 
us  to  recover  the  largest  components. 

As a bonus, when we reconstruct the sparse representation $\alpha$ from our measurements $y$, the values in $\alpha$
directly correspond to the coefficients in the spectrogram. This is apparent from the formulation of the Gabor
atoms and the STFT. A spectrogram analysis can be immediately displayed from $\alpha$ without final reconstruction of
the signal’s estimated time representation $\hat{x}$.
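
The correspondence between dictionary coefficients and spectrogram entries can be seen in a small Python sketch. It assumes a boxcar window with non-overlapping time slots, in which case the sampled Gabor dictionary is block diagonal with one unit-norm DFT block per slot. The hop frequencies, sizes, and function name are hypothetical, and the coefficients are computed directly from the signal rather than recovered from AIC measurements.

```python
import numpy as np

def boxcar_gabor_dictionary(n_windows, window_len):
    """Columns are boxcar-windowed complex sinusoids, one block per time slot."""
    t = np.arange(window_len)
    # Unit-norm DFT atoms for one slot: column m has frequency index m.
    F = np.exp(2j * np.pi * np.outer(t, t) / window_len) / np.sqrt(window_len)
    # Non-overlapping boxcar windows make the dictionary block diagonal.
    return np.kron(np.eye(n_windows), F)

n_windows, window_len = 4, 32
hop_bins = [3, 11, 7, 20]  # hypothetical carrier bin in each time slot
t = np.arange(window_len)
x = np.concatenate([np.exp(2j * np.pi * m * t / window_len) for m in hop_bins])

Psi = boxcar_gabor_dictionary(n_windows, window_len)
# The dictionary is orthonormal here, so the coefficients are Psi^H x.
alpha = Psi.conj().T @ x

# Reshaping |alpha| gives the spectrogram: rows are frequency, columns are time.
spectrogram = np.abs(alpha.reshape(n_windows, window_len)).T
peaks = spectrogram.argmax(axis=0)
```

Here `alpha` has one nonzero entry per time slot, and the reshaped magnitudes can be displayed as a spectrogram without ever forming a time-domain estimate.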

2.2.4  Analog-to-information  system  performance 

In this section we wish to characterize the SNR of the AIC system using known analysis of CS performance. We
present a theorem for $K$-sparse signals, which gives insight into the SNR behavior of the AIC system. The following
definition is used in the theorem.

Definition 1  A matrix $\Phi$ of size $M \times N$ holds the $K$-Restricted Isometry Property ($K$-RIP) with constant $\delta_K$ if
for all $x \in \mathbb{R}^N$ with $\|x\|_0 = K$,

$$(1 - \delta_K)\|x\|_2 \le \|\Phi x\|_2 \le (1 + \delta_K)\|x\|_2.$$

Theorem 1  Let $x$ be a $K$-sparse signal, i.e., $\|x\|_0 = K$, and let $y = \Phi x$ represent an AIC measurement setup,
where we label the reconstruction from the measurements $y$ using BPDN as $\hat{x}$. If $\Phi$ holds the
$K$-Restricted Isometry Property (RIP) with constant $\delta_K$ and if $\delta_{3K} + 3\delta_{4K} < 2$, then the SNR of the AIC system
obeys the lower bound

$$\mathrm{SNR}_{AIC} = 20\log\left(\frac{\|x\|_2}{\|\hat{x} - x\|_2}\right) \ge \mathrm{SNR}_{system} - 20\log\big((1 + \delta_K)\, C_{1,K}\big),$$

where $\mathrm{SNR}_{system}$ is the SNR of the sampling subsystem and $C_{1,K}$ is a constant depending only on $K$.

The condition on the RIP constants holds for random Gaussian matrices when the number of rows is large enough.
The theorem is proven in [14]. This bound on the performance decay will depend on the compressibility of the
signal and the class of matrix $\Phi$ applied. As an example, if a Gaussian random matrix $\Phi$ is used with a large enough
row-to-column  ratio,  and  the  signal  has  a  sparsity  K  =  A/10,  the  loss  in  performance  is  approximately  23dB. 
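
As a rough numerical companion to Definition 1 and Theorem 1, the sketch below samples random $K$-sparse vectors and records how far a Gaussian matrix stretches or shrinks them. This gives only an optimistic estimate of $\delta_K$, because the true constant is a worst case over all supports. The helper `snr_loss_db` simply evaluates the loss term $20\log_{10}((1+\delta_K)C_{1,K})$ for user-supplied values. All sizes and names are assumptions made for illustration.

```python
import numpy as np

def isometry_ratios(Phi, K, trials, rng):
    """Sample ||Phi x||_2 / ||x||_2 over random K-sparse vectors x."""
    M, N = Phi.shape
    ratios = np.empty(trials)
    for i in range(trials):
        x = np.zeros(N)
        x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
        ratios[i] = np.linalg.norm(Phi @ x) / np.linalg.norm(x)
    return ratios

def snr_loss_db(delta, C):
    """Loss term of the SNR bound, 20*log10((1 + delta) * C), in dB."""
    return 20.0 * np.log10((1.0 + delta) * C)

rng = np.random.default_rng(2)
M, N, K = 256, 1024, 10
# Scaling by 1/sqrt(M) makes the columns approximately unit norm.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

ratios = isometry_ratios(Phi, K, 500, rng)
# Empirical (optimistic) estimate of the isometry constant on the sampled vectors.
delta_est = max(1.0 - ratios.min(), ratios.max() - 1.0)
```

Increasing `M` tightens the spread of the sampled ratios around one, which is the behavior the RIP condition relies on.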

2.3  The  smashed  filter


As  the  second  thrust  of  our  work  on  information  scalability,  we  formulated  a  classification  algorithm  that  uses 
compressive  multiscale  measurements  to  exploit  the  low-dimensional  manifold  structure  inherent  in  the  signal  classes 
used  in  target  recognition  applications.  We  learned  this  manifold  structure  from  training  data,  which  served  as  a 
sampling  of  points  from  the  manifolds.  Such  structure  allowed  us  both  to  reduce  the  dimensionality  of  the  training 
data  through  random  measurements,  and  to  limit  the  amount  of  training  data  required  to  perform  the  classification. 


2.3.1  Generalized  likelihood  ratio  test 


In our setting, we have $P$ possible classes and we define the hypothesis $H_i$ to be that the observed image $x \in \mathbb{R}^N$
belongs to class $C_i$ for $i = 1, \ldots, P$. For each class $C_i$, an element $x \in C_i$ can be parameterized by a unique
$K$-dimensional parameter vector $\theta_i \in \Theta_i$, i.e., $x = f_i(\theta_i)$ for some $f_i$; an example parameter is the pose of the object in
the scene (translation, rotation, etc.). If the mapping $f_i$ is well-behaved, the collection of signals $\{f_i(\theta_i) : \theta_i \in \Theta_i\}$
forms a $K$-dimensional manifold embedded in the ambient signal space.

We will first assume that noisy measurements of $x$ are taken, $y = x + \omega$, giving us a distribution $p(y \mid \theta_i, H_i)$
for the measured signal $y$ under hypothesis $H_i$ and parameters $\theta_i$. The GLRT classifier is

$$C(y) = \arg\max_{i = 1, \ldots, P} \; p(y \mid \hat{\theta}_i, H_i), \qquad (2)$$

where

$$\hat{\theta}_i = \arg\max_{\theta \in \Theta_i} \; p(y \mid \theta, H_i) \qquad (3)$$

denotes the maximum likelihood estimate (MLE) of the parameters $\theta_i$ under hypothesis $H_i$. Under an additive
white Gaussian noise (AWGN) model for $\omega$, the likelihood for each hypothesis $H_i$ becomes

$$p(y \mid \hat{\theta}_i, H_i) \propto \frac{1}{\|y - f_i(\hat{\theta}_i)\|_2}, \qquad (4)$$

meaning that after estimates for the parameters are obtained for each class, the GLRT reduces to nearest-neighbor
classification among the available hypotheses.
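
The reduction to nearest-neighbor classification can be sketched in a few lines of Python. The example assumes two hypothetical one-dimensional signal classes (a Gaussian pulse and a box pulse) whose only unknown parameter is a shift, and it estimates the shift in each class by exhaustive search over a sampled grid. None of these choices come from the experiments in this report.

```python
import numpy as np

N = 64
t = np.arange(N)

def gaussian_pulse(shift):
    """Hypothetical class 0 model f_0(theta)."""
    return np.exp(-0.5 * ((t - shift) / 3.0) ** 2)

def box_pulse(shift):
    """Hypothetical class 1 model f_1(theta)."""
    return (np.abs(t - shift) <= 4).astype(float)

classes = [gaussian_pulse, box_pulse]
shifts = np.arange(10, 54)  # sampled parameter space for every class

def glrt_classify(y):
    """Return (class label, estimated shift) for the observation y."""
    best = []
    for f in classes:
        # Under AWGN, the MLE of the shift minimizes the Euclidean distance.
        dists = np.array([np.linalg.norm(y - f(s)) for s in shifts])
        k = int(np.argmin(dists))
        best.append((dists[k], shifts[k]))
    # The GLRT then picks the class whose best fit is closest to y.
    label = int(np.argmin([d for d, _ in best]))
    return label, int(best[label][1])

rng = np.random.default_rng(3)
y = box_pulse(30) + 0.05 * rng.standard_normal(N)
label, shift_hat = glrt_classify(y)
```

The exhaustive grid search is the simplest possible estimator; it is accurate but its cost grows with the density of the parameter sampling.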


2.3.2  Manifold  parameter  estimation 

In order to implement the GLRT as described above, we first need to obtain estimates of the parameter vectors
$\theta_i$ from the noisy measurements $y$ under each of the hypotheses. A natural approach to this problem is through
nonlinear least-squares, in which we seek the value of $\theta_i$ that minimizes the objective function

$$D(\theta_i) = \|y - f_i(\theta_i)\|_2^2. \qquad (5)$$

For differentiable $D(\theta)$, we can use Newton’s method to obtain iterative estimates of the parameters as

$$\theta_i^{n} = \theta_i^{n-1} - [H(\theta_i^{n-1})]^{-1} J(\theta_i^{n-1}) \qquad (6)$$

for the $n$th iteration, with $J(\theta) = \nabla D(\theta)$ (the gradient) and $H(\theta)$ the Hessian matrix of $D$; with a good initial
choice  the  algorithm  converges  to  the  correct  estimate.  Note  that  the  classical  matched  filter  is  an  elegant  method 
for  minimizing  (5)  on  the  manifold  consisting  of  all  possible  shifts  of  a  signal.  In  (6)  we  extend  this  approach  to 
arbitrary  differentiable  manifolds.  In  essence,  (6)  provides  a  way  of  generalizing  the  classical  matched  filter  to  a 
richer  class  of  manifolds,  while  reducing  the  number  of  samples  from  the  manifold  required  during  the  estimation 
process. 
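
A minimal Python sketch of the Newton iteration for a single shift parameter follows. It assumes a smooth, hypothetical signal model (a Gaussian pulse) and replaces the analytic gradient and Hessian with central finite differences; the starting point is deliberately chosen close to the true shift, since the iteration is only expected to converge from a good initial estimate.

```python
import numpy as np

t = np.arange(100.0)

def f(theta, width=5.0):
    """Hypothetical signal model: a Gaussian pulse centered at theta."""
    return np.exp(-0.5 * ((t - theta) / width) ** 2)

def D(theta, y):
    """Least-squares objective ||y - f(theta)||_2^2."""
    return np.sum((y - f(theta)) ** 2)

def newton_shift_estimate(y, theta0, iters=10, h=1e-3):
    theta = theta0
    for _ in range(iters):
        # Central finite differences stand in for the gradient J and Hessian H.
        J = (D(theta + h, y) - D(theta - h, y)) / (2 * h)
        H = (D(theta + h, y) - 2 * D(theta, y) + D(theta - h, y)) / h ** 2
        theta = theta - J / H
    return theta

rng = np.random.default_rng(4)
y = f(40.0) + 1e-3 * rng.standard_normal(t.size)
theta_hat = newton_shift_estimate(y, theta0=42.0)
```

Unlike a grid search, the iteration evaluates the model at only a handful of parameter values, which is the saving in manifold samples referred to above.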

However,  in  extending  this  to  our  compressive  classification  setting,  we  face  a  number  of  challenges.  First,  in 
general, implementing such an estimator requires complete knowledge of the function $f_i$ or the ability to evaluate
$f_i(\theta)$ for all possible values of $\theta$. In some practical settings this may not be possible, but this is easily overcome
since a dense sampling of the parameter space $\Theta_i$ and a nearest neighbor (NN) estimation rule can give acceptable
performance,  albeit  with  a  potentially  high  computational  cost.  Potentially  more  challenging  is  that:  (i)  our 
manifolds  may  not  be  differentiable,  in  which  case  we  cannot  directly  apply  (6)  [15],  and  (ii)  it  may  be  possible 
that  random  projections  of  our  data  could  alter  the  manifold  structure  of  our  signals.  Fortunately,  we  can  overcome 
both  of  these  challenges  through  the  use  of  multiscale  measurements. 

2.3.3  Multiscale  measurements  for  image  appearance  manifolds

In the case of interest, target classification, the classes $C_i$ are image appearance manifolds (IAMs) that each
correspond to different classes of targets. The parameter vector $\theta_i$ denotes the articulation parameters for the target,
such as rotation, translation, angle of view, etc. The resulting parametric manifolds are nonlinear, since linear
combinations of manifold elements are in general not contained in the manifold, and non-differentiable, due to
prevalent changes in hard edges in the image view caused by rotations in and out of view, occlusions, etc. Previous
research [15] has identified a multiscale structure to such manifolds that can be exploited through regularization to
allow differentiability at several scales. This is achieved through the use of a nested set of regularization kernels
$G_1, G_2, \ldots$, one for each iteration, with the kernels becoming progressively sharper. Thus, instead of applying
Newton’s method to $f_i(\theta_i)$ directly, we use an objective function for the regularized images; for the $n$th iteration,
the objective function becomes

$$D_n(\theta_i) = \|G_n y - G_n f_i(\theta_i)\|_2^2, \qquad (7)$$

which  uses  the  corresponding  regularization  kernel. 

2.3.4  Compressive  measurements  for  smooth  manifolds 

It has also been shown that most of the structure of a smooth signal manifold is preserved under a random
lower-dimensional projection [16]. More specifically, for a $K$-dimensional manifold embedded in $N$-dimensional space,
with high probability a random $M$-dimensional projection is invertible (and thus preserves the manifold structure)
provided that $M > CK\log(N)$ for some constant $C$ that depends on the smoothness of the manifold. Thus, instead
of performing parameter estimation based on a direct measurement of the signal $y = x + \omega$, we can choose to
observe only a lower-dimensional, randomly projected version $y = \Phi x + \omega$, where $\Phi$ is an $M \times N$ measurement
matrix with independent, randomly distributed entries. Accordingly, we update the objective function to

$$D_c(\theta_i) = \|y - \Phi f_i(\theta_i)\|_2^2. \qquad (8)$$

When  Gaussian  random  measurements  are  used,  this  is  equivalent  to  employing  different  colored  Gaussian  random 
measurements  at  each  iteration;  see  [17]  for  more  details.  Moreover,  the  dimensionality  reduction  affords  savings 
in  computational  complexity  and  storage  requirements  of  the  estimation  and  classification  algorithms  described 
earlier. 
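
The compressive objective can be illustrated with a short Python sketch that evaluates it on a sampled smooth manifold and picks the minimizing parameter. The manifold of shifted Gaussian pulses, the Gaussian measurement matrix, and all sizes are hypothetical assumptions chosen so that the example is small and self-contained.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 256, 32
t = np.arange(N)

# Sampled manifold: one smooth pulse per candidate shift (rows are f(theta)).
shifts = np.arange(40, 216)
manifold = np.stack([np.exp(-0.5 * ((t - s) / 6.0) ** 2) for s in shifts])

# Random projection from N to M dimensions.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
projected = manifold @ Phi.T  # rows are Phi f(theta)

# Noisy compressive measurement of one manifold point.
true_index = 70
y = Phi @ manifold[true_index] + 0.01 * rng.standard_normal(M)

# Objective (8) evaluated on the sampled grid, then nearest-neighbor estimation.
D_c = np.sum((projected - y) ** 2, axis=1)
estimate = shifts[int(np.argmin(D_c))]
```

Only the `M`-dimensional projections of the manifold samples need to be stored and compared, which is where the storage and computation savings come from.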

2.3.5  The  multiscale  smashed  filter 

We are now in a position to describe how we will overcome the challenges listed at the end of Section 2.3.2. In [18]
we introduced the smashed filter as a classification method that uses compressive measurements when
each class is represented by a low-dimensional manifold. This is inspired by the fact that random projections
do  not  disturb  the  structure  of  smooth  manifolds,  as  described  above.  However,  as  we  have  just  observed,  in  our 
setting  the  manifolds  might  not  be  smooth.  To  address  this  problem,  we  exploit  the  multiscale  structure  of  IAMs 
and  combine  the  use  of  multiscale  measurements  with  random  projections.  Thus  we  smooth  the  IAMs  so  that  the 
projections preserve their geometry. This uses a measurement matrix of the form

$$\Phi = \begin{bmatrix} \Phi_1 G_1 \\ \vdots \\ \Phi_S G_S \end{bmatrix},$$

where $\Phi_n$ is an $M_n \times N$ matrix with randomly distributed entries and $G_n$ is the regularization kernel for the $n$th
scale. The resulting measurements can be partitioned into measurements for each of the regularized versions, i.e.,
$y_n = \Phi_n G_n x + \omega_n$, which are used on sequential iterations of Newton’s method by employing the corresponding
objective functions

$$D_{c,n}(\theta_i) = \|y_n - \Phi_n G_n f_i(\theta_i)\|_2^2. \qquad (9)$$

This classification algorithm, which we call the multiscale smashed filter, employs the compact and multiscale nature
of  the  manifolds  defined  by  the  signal  classes  to  estimate  the  signal  parameters  under  each  class  hypothesis,  together 
with  the  GLRT/NN  classification  rule  from  Section  2.3.1. 
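
The stacked multiscale measurement matrix can be assembled as in the following Python sketch. It is an illustration under assumptions: the regularization kernels are modeled as circulant Gaussian smoothing matrices acting on a one-dimensional signal, the kernel widths and the number of measurements per scale are arbitrary, and the function name is hypothetical.

```python
import numpy as np

def smoothing_kernel(N, width):
    """Circulant Gaussian smoothing matrix, a stand-in for the kernel G_n."""
    idx = np.arange(N)
    offsets = np.minimum(idx, N - idx)  # circular distance to index 0
    h = np.exp(-0.5 * (offsets / width) ** 2)
    h /= h.sum()
    # Row k is the kernel circularly shifted to position k.
    return np.stack([np.roll(h, k) for k in range(N)])

rng = np.random.default_rng(6)
N = 128
widths = [8.0, 4.0, 2.0, 1.0]  # coarse to fine: the kernels become sharper
M_n = [10, 10, 10, 10]         # measurements taken at each scale

blocks = []
for width, m in zip(widths, M_n):
    Phi_n = rng.standard_normal((m, N)) / np.sqrt(m)
    blocks.append(Phi_n @ smoothing_kernel(N, width))
Phi = np.vstack(blocks)  # stacked [Phi_1 G_1; ...; Phi_S G_S]

# One pass of measurement, then a split into per-scale measurement vectors y_n.
x = rng.standard_normal(N)
y = Phi @ x
y_scales = np.split(y, np.cumsum(M_n)[:-1])
```

Each entry of `y_scales` would feed one Newton iteration, starting from the coarsest scale and ending at the finest.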

2.3.6  Advantages  of  compressive  classification 

In  addition  to  the  computational  and  storage  savings  achieved  by  compressive  classification,  our  proposed  method 
shares  many  advantages  previously  shown  for  CS  reconstruction.  In  particular,  random  projections  allow  for 
universal  estimation  and  classification,  in  the  sense  that  random  projections  preserve  any  low-dimensional  structure 
of a signal class with high probability. Additionally, we attain progressivity in the sense that a larger number of
projections translates into higher classification rates due to increased noise tolerance.




Figure 3: Models used for classification experiments: (a) tank, (b) school bus, (c) truck.


Figure 4: Vehicle classification result from compressive imaging measurements using the smashed filter. The probability of
classification and the position estimation error improve as the number of measurements increases.

2.3.7  Experimental  performance 

We performed experiments to evaluate the multiscale smashed filter in a target classification setting using synthetically
generated binary random measurements. In these experiments we define three classes, each for a different
vehicle model: a T-72 tank, a school bus and a truck. The unknown parameter in each vehicle class is the location
of the vehicle in the image, which can vary in an area of 32 × 32 pixels. The models used are shown in Figure 3. For
each of the vehicles, multiscale measurements were taken using five different resolutions, from 8 × 8 to 128 × 128
pixels, with the same number of measurements taken at each of them.

We  tested  the  performance  of  the  multiscale  smashed  filter  classifier  under  different  levels  of  Gaussian  noise. 
The  measurements  for  each  position/class  combination  were  classified  using  a  multiscale  smashed  filter  trained  on 
all other available data points. For each of the target classes, one of the sampled positions was chosen at random
as an initial estimate. The gradient of the manifold was estimated using consecutive points in the manifold
sampling for each parameter, including that of the current estimate. We then executed Newton’s method using
measurements  at  different  resolutions  at  each  iteration,  proceeding  from  the  coarsest  to  the  finest  scale.  After 
the  position  was  estimated  under  each  hypothesis,  nearest  neighbor  classification  was  performed.  We  repeated  the 
experiment  10,000  times  for  each  testing  point,  with  randomly  selected  starting  points  each  time,  and  we  varied  the 
number of measurements taken from 5 to 60. We also varied the power of the noise added to the measurement vector.
Results are shown in Figure 4, and show that due to the low-dimensional structure of the underlying IAM, very
few  measurements  are  necessary  to  achieve  high  classification  rates.  Additionally,  the  performance  of  the  algorithm 
degrades  gracefully  as  the  power  of  the  noise  present  increases. 

2.4  Future  work 

The  results  of  both  of  our  thrusts  lead  to  several  areas  of  future  work.  Analog-to-information  theory  and  practice 
can  be  applied  to  any  sensing  scenario  in  which  the  volume  of  data  exceeds  traditional  analog-to-digital  conversion 
abilities,  or  makes  them  cost-prohibitive.  Therefore  continued  exploration  into  implementation,  reconstruction 
technique,  performance  analysis,  and  robustness  to  noise  would  be  fruitful.  With  the  compressive  classification 
thrust,  we  hope  to  develop  more  sophisticated  algorithms  to  exploit  the  manifold  structure  to  more  efficiently 
obtain  the  ML  estimates  required  by  the  smashed  filter.  For  example,  rather  than  an  exhaustive  nearest-neighbor 
search,  which  could  be  computationally  prohibitive  for  a  large  training  set,  a  greedy  approach  might  offer  similar 
performance at significant computational savings; other approaches that exploit the smoothness of the manifolds
could  also  be  beneficial. 


3  Distributed  compressive  sensing 

3.1  Summary  of  results 

The  CS  framework  has  been  proposed  for  efficient  acquisition  of  sparse  and  compressible  signals  through  incoherent 
measurements. In past work, we introduced a new concept of joint sparsity of a signal ensemble and used it
in demonstrating distributed CS schemes. In this project we considered joint sparsity via graphical models that
link  the  sparse  underlying  coefficient  vector,  signal  entries,  and  measurements.  Our  main  results  are  converse  and 
achievable  bounds  establishing  that  the  number  of  measurements  required  in  the  noiseless  measurement  setting  is 
closely  related  to  the  dimensionality  of  the  sparse  coefficient  vector.  Single  signal  and  joint  (single-encoder)  CS  are 
special  cases  of  joint  sparsity,  and  their  performance  limits  fit  into  our  graphical  model  framework  for  distributed 
(multi-encoder)  CS. 

3.2  Review  of  joint  sparsity  models 

In this section, we generalize the notion of a signal being sparse in some basis to joint sparsity within a signal
ensemble. We begin with basic notation. Let $\Lambda := \{1, 2, \ldots, J\}$ be the set of signal indices. Denote the
signals in the ensemble by $x_j \in \mathbb{R}^N$, where $j \in \Lambda$. We use $x_j(n)$ to denote sample $n$ in
signal $j$, and assume for the sake of illustration that these signals are sparse in the canonical basis, i.e.,
$\Psi = I$. The entries of the signal can take arbitrary real values, and the framework is extendable to arbitrary $\Psi$.

We denote by $\Phi_j$ the measurement matrix for signal $j$; $\Phi_j$ is $M_j \times N$ and, in general, the entries
of $\Phi_j$ are different for each $j$. Thus, $y_j = \Phi_j x_j$ consists of $M_j < N$ random measurements of $x_j$.
We emphasize random Gaussian matrices $\Phi_j$ in the following, but other measurement matrices are possible. To
compactly represent the signal and measurement ensembles, we define $X = [x_1^T \ldots x_J^T]^T \in \mathbb{R}^{JN}$
and $Y = [y_1^T \ldots y_J^T]^T \in \mathbb{R}^{\sum_{j \in \Lambda} M_j}$. Finally, we also define
$\Phi = \mathrm{diag}(\Phi_1, \ldots, \Phi_J)$, where diag denotes a matrix diagonal concatenation, to get $Y = \Phi X$.
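
The stacked notation above can be illustrated with a minimal sketch. The sizes, signal values, seed, and helper names (`gaussian_matrix`, `matvec`, `block_diag`) are hypothetical choices made for illustration only; the sketch simply checks that measuring each signal separately agrees with the single block-diagonal product $Y = \Phi X$.

```python
import random

def gaussian_matrix(rows, cols, rng):
    """Return a rows x cols list-of-lists with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(A, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def block_diag(blocks):
    """Diagonal concatenation diag(Phi_1, ..., Phi_J) of the given blocks."""
    total_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            out.append([0.0] * offset + list(row)
                       + [0.0] * (total_cols - offset - width))
        offset += width
    return out

rng = random.Random(0)          # fixed seed, for repeatability only
N = 6                           # signal length (hypothetical)
M = [3, 2]                      # measurements per sensor (hypothetical)
signals = [[0.0, 1.5, 0.0, 0.0, -2.0, 0.0],
           [0.0, 1.5, 0.0, 0.7, 0.0, 0.0]]

Phis = [gaussian_matrix(m, N, rng) for m in M]
ys = [matvec(Phi_j, x_j) for Phi_j, x_j in zip(Phis, signals)]  # y_j = Phi_j x_j

X = [v for x_j in signals for v in x_j]   # stacked ensemble, length J*N
Y = [v for y_j in ys for v in y_j]        # stacked measurements
Phi = block_diag(Phis)                    # (M_1 + M_2) x (J*N)
Y_joint = matvec(Phi, X)                  # Y = Phi X
```

Because $\Phi$ is block diagonal, each sensor's measurements depend only on its own signal, which is what makes the setting distributed.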

3.2.1  Algebraic  framework 

Our framework enables analysis of a given ensemble $x_1, x_2, \ldots, x_J$ in a “jointly sparse” sense, as well as a
metric for the complexities of different signal ensembles. It is based on a factored representation of the signal
ensemble, and decouples location and value information. We begin by illustrating the single signal case.

Single signal case: Consider a sparse $x \in \mathbb{R}^N$ with $K < N$ nonzero entries. Alternatively, we can write
$x = P\theta$, where $\theta \in \mathbb{R}^K$ contains the nonzero values of $x$, and $P$ is an identity submatrix,
i.e., $P$ contains $K$ columns of the $N \times N$ identity matrix $I$. To model the set of all possible sparse signals,
let $\mathcal{P}$ be the set of all identity submatrices of all possible sizes $N \times K'$, with $1 \le K' \le N$. We
refer to $\mathcal{P}$ as a sparsity model. Given a signal $x$, one may consider all possible factorizations
$x = P\theta$, with $P \in \mathcal{P}$. Among them, the smallest dimensionality for $\theta$ indicates the sparsity
of $x$ under the model $\mathcal{P}$.
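
As a small illustration of the factorization $x = P\theta$, the sketch below builds the minimal identity submatrix and value vector for a given sparse vector. The function name `factor_sparse` and the example vector are hypothetical, and indices are 0-based here rather than 1-based as in the text.

```python
def factor_sparse(x):
    """Factor x = P theta with P an identity submatrix (minimal factorization).

    P keeps the columns of the N x N identity indexed by the support of x,
    and theta lists the corresponding nonzero values."""
    support = [n for n, v in enumerate(x) if v != 0]
    N = len(x)
    P = [[1 if n == s else 0 for s in support] for n in range(N)]
    theta = [x[s] for s in support]
    return P, theta

x = [0.0, 3.0, 0.0, 0.0, -1.0]
P, theta = factor_sparse(x)

# Rebuild x from the location matrix and the value vector.
x_rebuilt = [sum(p * t for p, t in zip(row, theta)) for row in P]
```

Here the number of columns of `P` equals the sparsity of `x`; a feasible but non-minimal factorization would carry extra columns whose values in `theta` are zero.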

Multiple signal case: For multiple signals, consider factorizations of the form $X = P\Theta$ where
$X \in \mathbb{R}^{JN}$ as above, $P \in \mathbb{R}^{JN \times D}$, and $\Theta \in \mathbb{R}^D$. We refer to $P$ and
$\Theta$ as the location matrix and value vector, respectively. A joint sparsity model (JSM) is defined in terms of a
set $\mathcal{P}$ of admissible location matrices $P$ with varying numbers of columns. Unlike the single signal case,
there are multiple choices for what matrices $P$ belong to a joint sparsity model $\mathcal{P}$.

Minimal sparsity: For a given ensemble $X$, let $\mathcal{P}_F(X)$ denote the set of feasible location matrices
$P \in \mathcal{P}$ for which a factorization $X = P\Theta$ exists. Among the feasible location matrices, we let
$\mathcal{P}_M(X) \subseteq \mathcal{P}_F(X)$ denote the matrices $P$ having the minimal number of columns. The number
of columns $D$ for each $P \in \mathcal{P}_M(X)$ is called the joint sparsity level of $X$ under the model
$\mathcal{P}$. Generally speaking, the minimal location matrices $\mathcal{P}_M(X)$ permit the most efficient
factorizations of the signal ensemble; we show in Section 3.3 that these matrices dictate the number of measurements.

We restrict our attention in this paper to scenarios where each signal $x_j$ is generated as a combination of two
components: (i) a common component $z_C$, which is present in all signals, and (ii) an innovation component $z_j$,
which is unique to each signal. These combine additively, giving $x_j = z_C + z_j$, $j \in \Lambda$. However,
individual components might be zero-valued in specific scenarios.

3.2.2 Example joint sparsity model: JSM-1

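In the sparse common and innovations (JSM-1) model [19], the common component $z_C$ and each innovation
component $z_j$ are sparse with respective sparsities $K_C$ and $K_j$. Within our algebraic framework, the class of
JSM-1 signals corresponds to the set of all matrices

$$P = \begin{bmatrix} P_C & P_1 & 0 & \cdots & 0 \\ P_C & 0 & P_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ P_C & 0 & 0 & \cdots & P_J \end{bmatrix},$$

where $P_C$ and $\{P_j\}_{j \in \Lambda}$ are arbitrary identity submatrices of sizes $N \times K_C$ and $N \times K_j$,
respectively, and 0 denotes a zero matrix of appropriate size. Given $X = P\Theta$, we can partition the value vector
$\Theta = [\theta_C^T\ \theta_1^T\ \theta_2^T \cdots \theta_J^T]^T$, where $\theta_C \in \mathbb{R}^{K_C}$ and each
$\theta_j \in \mathbb{R}^{K_j}$. When generating a signal according to this model, we have $z_C = P_C\theta_C$,
$z_j = P_j\theta_j$, $j \in \Lambda$. If $P \in \mathcal{P}_M(X)$, then the joint sparsity is
$D = K_C + \sum_{j \in \Lambda} K_j$.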

Sparsity reduction: If a signal ensemble $X = P\Theta$, $\Theta \in \mathbb{R}^D$, were to be generated by a selection
of $P_C$ and $\{P_j\}_{j \in \Lambda}$, where all $J + 1$ identity submatrices share a common column vector, then
$P \notin \mathcal{P}_M(X)$. By removing the instance of this column in $P_C$, one obtains $Q \in \mathcal{P}$ such that
there exists $\Theta' \in \mathbb{R}^{D-1}$ with $X = Q\Theta'$. We term this phenomenon sparsity reduction, since it
reduces the effective joint sparsity of a signal ensemble.
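
The JSM-1 location matrix and the sparsity reduction effect can be sketched on a toy ensemble. Everything below (the helper names `jsm1_location_matrix` and `apply_location`, the supports, and the values) is a hypothetical illustration with 0-based indices: index 0 appears in the common support and in both innovation supports, so the column of $P_C$ at that index can be dropped and its value absorbed into the innovations.

```python
def jsm1_location_matrix(N, common_support, innovation_supports):
    """Build the JN x D JSM-1 location matrix as a 0/1 list of rows.

    Columns are ordered as [P_C | P_1 | ... | P_J]: one column per common
    index (present in every signal), then one per innovation index."""
    J = len(innovation_supports)
    columns = [[(j, n) for j in range(J)] for n in common_support]
    for j, support in enumerate(innovation_supports):
        columns.extend([[(j, n)] for n in support])
    P = [[0] * len(columns) for _ in range(J * N)]
    for d, positions in enumerate(columns):
        for j, n in positions:
            P[j * N + n][d] = 1
    return P

def apply_location(P, theta):
    """Compute X = P theta."""
    return [sum(p * t for p, t in zip(row, theta)) for row in P]

N = 3
# Index 0 is shared by P_C, P_1 and P_2, so P is feasible but not minimal.
P = jsm1_location_matrix(N, [0], [[0, 1], [0, 2]])
theta = [1.0, 2.0, 3.0, 4.0, 5.0]          # [theta_C | theta_1 | theta_2]
X = apply_location(P, theta)

# Remove the shared column from P_C and fold its value into the innovations.
Q = jsm1_location_matrix(N, [], [[0, 1], [0, 2]])
theta_reduced = [1.0 + 2.0, 3.0, 1.0 + 4.0, 5.0]
X_reduced = apply_location(Q, theta_reduced)
```

Both factorizations describe the same ensemble, but `Q` has one fewer column than `P`, so the effective joint sparsity drops from 5 to 4.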

3.3 DCS goal: Bounds on measurement rates

We seek conditions on the number of measurements from each sensor that guarantee perfect recovery of $X$ given $Y$.
Within our algebraic framework, recovering $X$ involves determining a value vector $\Theta$ and location matrix $P$
such that $X = P\Theta$. Two challenges are present. First, a given measurement depends only on some of the components
of $\Theta$, and the measurement budget should be adjusted between the sensors in order to gather sufficient
information on all components of $\Theta$. Second, the decoder must identify a feasible location matrix
$P \in \mathcal{P}_F(X)$ from the set $\mathcal{P}$ and the measurements $Y$. In this section, we develop tools to
address these challenges and characterize the number of measurements they require.

3.3.1  Graphical  model  framework 

We introduce a graphical representation that captures the dependencies between the measurements in $Y$ and the
value vector $\Theta$, represented by $\Phi$ and $P$. Consider a feasible decomposition of $X$ into
$P \in \mathcal{P}_F(X)$ and the corresponding $\Theta$. We define the following sets of vertices, illustrated in
Figure 5(a): (i) the set of value vertices $V_V$ has elements with indices $d \in \{1, \ldots, D\}$ representing
entries of the value vector $\Theta(d)$; (ii) the set of signal vertices $V_S$ has elements with indices $(j, n)$
representing the signal entries $x_j(n)$, with $j \in \Lambda$ and $n \in \{1, \ldots, N\}$; and (iii) the set of
measurement vertices $V_M$ has elements with indices $(j, m)$ representing the measurements $y_j(m)$, with
$j \in \Lambda$ and $m \in \{1, \ldots, M_j\}$. The cardinalities of these sets are $|V_V| = D$, $|V_S| = JN$ and
$|V_M| = \sum_{j \in \Lambda} M_j$.

Let $P$ be partitioned into location submatrices $P^j$, $j \in \Lambda$, so that $x_j = P^j\Theta$; here $P^j$ is the
restriction of $P$ to the rows that generate the signal $x_j$. We then define the bipartite graph
$G = (V_S, V_V, E)$, determined by $P$, where there exists an edge connecting $(j, n)$ and $d$ if and only if
$P^j(n, d) \ne 0$.

A similar bipartite graph $G' = (V_M, V_S, E')$, illustrated in Figure 5(a), connects the measurement vertices
$\{(j, m)\}$ and the signal vertices $\{(j, n)\}$; there exists an edge in $G'$ connecting $(j, n) \in V_S$ and
$(j, m) \in V_M$ if $\Phi_j(m, n) \ne 0$. When the measurement matrices $\Phi_j$ are dense, which occurs with
probability one for i.i.d. Gaussian random matrices, the vertices corresponding to entries of a given signal $x_j$ in
$V_S$ are all connected to all vertices corresponding to the measurements $y_j$ in $V_M$. Figure 5 shows an example
for dense measurement matrices: each measurement vertex $(j, \cdot)$ is connected to each signal vertex $(j, \cdot)$.

The graphs $G$ and $G'$ can be merged into $\widehat{G} = (V_M, V_V, \widehat{E})$ that relates entries of the value
vector to measurements. Figure 5(b) shows the example composition of the previous two bipartite graphs.
$\widehat{G}$ is used to recover $\Theta$ from the measurement ensemble $Y$ when $P$ is known.

3.3.2  Quantifying  dependencies  and  redundancies 

We now define the subset of the value vector entries that is measured exclusively by a subset $\Gamma$ of the sensors
in the ensemble; the cardinality of this set will help determine the number of measurements the sensors in $\Gamma$
should perform. We denote by $E(V)$ the neighbors of a set of vertices $V$ through the edge set $E$.

Figure 5: Bipartite graphs for distributed compressed sensing. (a) $G = (V_S, V_V, E)$ connects the entries of each
signal with the value vector coefficients they depend on; $G' = (V_M, V_S, E')$ connects the measurements at each
sensor with observed signal entries. The matrix $\Phi$ is a dense Gaussian random matrix, as shown in the graph.
(b) $\widehat{G} = (V_M, V_V, \widehat{E})$ is the composition of $G$ and $G'$, and relates value vector coefficients
and measurements. (c) Sets of exclusive indices for our example.


Definition 2 Let $G = (V_S, V_V, E)$ be the bipartite graph determined by $P$, let $\Gamma \subseteq \Lambda$, and let
$V_S(\Gamma)$ be the set of vertices $V_S(\Gamma) = \{(j, n) \in V_S : j \in \Gamma, n \in \{1, \ldots, N\}\}$. We define
the set of exclusive indices for $\Gamma$ given $P$, denoted $I(\Gamma, P)$, as the largest subset of
$\{1, \ldots, D\}$ such that $E(I(\Gamma, P)) \subseteq V_S(\Gamma)$.

$I(\Gamma, P)$ is significant in our distributed measurement setting, because it contains the coefficients of $\Theta$
that only affect the signals in the set $\Gamma$ and, therefore, can only be measured by those sensors. Figure 5(c)
shows an example setting of two signals of length $N = 3$ generated by a matrix $P$ from the JSM-1 model, with the
sets $I(\{1\}, P)$ and $I(\{2\}, P)$ defined as the vertices in $V_V$ that connect exclusively with $V_S(\{1\})$ and
$V_S(\{2\})$, respectively.
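
Definition 2 can be read directly off the columns of $P$: a column index belongs to $I(\Gamma, P)$ when all of its nonzero rows lie in signals of $\Gamma$. The sketch below assumes a hypothetical JSM-1 location matrix for two signals of length $N = 3$ (not necessarily the one drawn in Figure 5), uses 0-based indices, and the function name `exclusive_indices` is illustrative.

```python
def exclusive_indices(P, N, gamma):
    """Return I(gamma, P): columns of P whose nonzero rows all fall in
    signals j belonging to gamma. Row r of P corresponds to signal r // N."""
    gamma = set(gamma)
    result = set()
    for d in range(len(P[0])):
        touched = {r // N for r in range(len(P)) if P[r][d] != 0}
        if touched <= gamma:
            result.add(d)
    return result

N = 3
# Hypothetical JSM-1 example: column 0 is a common coefficient at n = 1,
# column 1 is an innovation of signal 0 at n = 0,
# column 2 is an innovation of signal 1 at n = 2.
P = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [1, 0, 0],
     [0, 0, 1]]

I_0 = exclusive_indices(P, N, {0})        # only signal 0
I_1 = exclusive_indices(P, N, {1})        # only signal 1
I_all = exclusive_indices(P, N, {0, 1})   # the whole ensemble
```

In this toy case the common coefficient (column 0) is exclusive to neither single sensor, so it only appears in the set for the full ensemble.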

Overlaps: When overlaps between common and innovation components are present in a signal, we cannot
recover the overlapped portions of both components from the measurements of this signal alone; we need to recover
the common component’s coefficients using measurements of other signals that do not feature the same overlap.
Furthermore, these coefficients of the value vector are not included in $I(\Gamma, P)$. We thus quantify the size of
the overlap for all subsets of signals $\Gamma \subseteq \Lambda$ under a feasible representation given by $P$ and
$\Theta$.

Definition 3 The overlap size for the set of signals $\Gamma \subseteq \Lambda$, denoted $K_{C,\Gamma}$, is the number
of indices in which there is overlap between the common and the innovation component supports at the signals
$j \notin \Gamma$; more formally,

$$K_{C,\Gamma}(P) = |\{n \in \{1, \ldots, N\} : z_C(n) \ne 0,\ \forall j \notin \Gamma,\ z_j(n) \ne 0\}|.$$

For the entire set of signals, the overlap size $K_{C,\Lambda} = 0$.

For $\Gamma \ne \Lambda$, $K_{C,\Gamma}(P)$ provides a penalty term due to the need for recovery of common component
coefficients that are overlapped by innovations in all other signals $j \notin \Gamma$. The definition of
$K_{C,\Lambda}$ accounts for the fact that all the coefficients of $\Theta$ are included in $I(\Lambda, P)$.
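
A minimal sketch of Definition 3 follows, working directly from the components $z_C$ and $z_j$ rather than from $P$. The function name `overlap_size` and the example components are hypothetical, and indices are 0-based.

```python
def overlap_size(z_common, innovations, gamma):
    """Overlap size K_{C,gamma}: number of indices n where the common
    component is nonzero and every signal outside gamma also has a
    nonzero innovation. Defined as 0 when gamma is the whole ensemble."""
    others = [j for j in range(len(innovations)) if j not in gamma]
    if not others:
        return 0
    return sum(1 for n, zc in enumerate(z_common)
               if zc != 0 and all(innovations[j][n] != 0 for j in others))

z_common = [1.0, 0.0, 2.0, 0.0]
innovations = [[3.0, 0.0, 0.0, 0.0],    # z_0 overlaps z_C at n = 0
               [4.0, 0.0, 5.0, 0.0]]    # z_1 overlaps z_C at n = 0 and n = 2

K_0 = overlap_size(z_common, innovations, {0})       # overlaps in signal 1
K_1 = overlap_size(z_common, innovations, {1})       # overlaps in signal 0
K_all = overlap_size(z_common, innovations, {0, 1})  # whole ensemble
```

The value for $\Gamma = \{0\}$ counts overlaps in the other signal, reflecting that sensor 0 must help recover the common coefficients that signal 1 cannot resolve by itself.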

3.3.3  Main  results 

Converse  and  achievable  bounds  on  the  number  of  measurements  necessary  for  recovery  are  given  below. 

Theorem 2 (Achievable, known $P$) Assume that a signal ensemble $X$ is obtained from a common/innovation
component JSM $\mathcal{P}$. Let $\{M_j\}_{j \in \Lambda}$ be a measurement tuple. Suppose there exists a full rank
location matrix $P \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j \ge |I(\Gamma, P)| + K_{C,\Gamma}(P) \tag{10}$$

for all $\Gamma \subseteq \Lambda$. If the $\Phi_j$ are random matrices having $M_j$ rows of i.i.d. Gaussian entries for
each $j \in \Lambda$, and if $Y = \Phi X$, then with probability one over $\Phi$ there is a unique solution $\Theta$ to
the system of equations $Y = \Phi P \Theta$, and hence the signal ensemble $X$ can be uniquely reconstructed as
$X = P\Theta$.

Theorem 3 (Achievable, unknown $P$) Assume that a signal ensemble $X$ is obtained from a common/innovation
component JSM $\mathcal{P}$, and let $\Phi_j$ be random matrices having $M_j$ rows of i.i.d. Gaussian entries for each
$j \in \Lambda$. If there exists a location matrix $P^* \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j \ge |I(\Gamma, P^*)| + K_{C,\Gamma}(P^*) + |\Gamma| \tag{11}$$

for all $\Gamma \subseteq \Lambda$, then $X$ can be uniquely recovered from $Y$ with probability one over $\Phi$.

Theorem 4 (Converse) Assume that a signal ensemble $X$ is obtained from a common/innovation component JSM
$\mathcal{P}$. Let $\{M_j\}_{j \in \Lambda}$ be a measurement tuple. Suppose there exists a full rank location matrix
$P \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j < |I(\Gamma, P)| + K_{C,\Gamma}(P) \tag{12}$$

for some $\Gamma \subseteq \Lambda$. Let $\Phi_j$ be any set of measurement matrices having $M_j$ rows for each
$j \in \Lambda$, and let $Y = \Phi X$. Then there exists a solution $\widehat{\Theta}$ such that
$Y = \Phi P \widehat{\Theta}$ but $\widehat{X} := P\widehat{\Theta} \ne X$.

The number of measurements needed for recovery depends on the number of value vector coefficients that are
observed only by the sensors in $\Gamma$. The identification of a feasible location matrix $P$ causes the
one-measurement-per-sensor gap, the $|\Gamma|$ term, between the achievable and converse bounds (11)-(12). The
algorithm used in Theorem 3 essentially performs an $\ell_0$ minimization to acquire $\Theta$, where the correct $P$
is identified using an additional cross-validation step.
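
To make the three conditions concrete, the sketch below compares a measurement tuple against (10), (11), and (12) over all nonempty subsets $\Gamma$. It is only a bookkeeping aid under stated assumptions: the sizes $|I(\Gamma, P)|$ and $K_{C,\Gamma}(P)$ are supplied by the caller for one fixed feasible $P$, the empty subset is skipped, the other hypotheses of the theorems (for example, full rank $P$) are not checked, and the function names and example numbers are hypothetical.

```python
from itertools import combinations

def nonempty_subsets(J):
    """Yield every nonempty subset of the sensor indices {0, ..., J-1}."""
    for size in range(1, J + 1):
        for subset in combinations(range(J), size):
            yield frozenset(subset)

def classify_tuple(M, excl_size, overlap):
    """Compare a measurement tuple M against the bounds (10)-(12).

    excl_size[G] and overlap[G] hold |I(G, P)| and K_{C,G}(P) for each
    nonempty subset G, given as a frozenset of sensor indices."""
    known_ok, unknown_ok = True, True
    for G in nonempty_subsets(len(M)):
        total = sum(M[j] for j in G)
        need = excl_size[G] + overlap[G]
        if total < need:             # condition (12) holds for this G
            known_ok = False
        if total < need + len(G):    # condition (11) fails for this G
            unknown_ok = False
    if unknown_ok:
        return "achievable, unknown P (11)"
    if known_ok:
        return "achievable, known P (10)"
    return "converse applies (12)"

# Hypothetical JSM-1 ensemble with J = 2, K_C = K_1 = K_2 = 1, no overlaps.
excl = {frozenset({0}): 1, frozenset({1}): 1, frozenset({0, 1}): 3}
over = {frozenset({0}): 0, frozenset({1}): 0, frozenset({0, 1}): 0}

result_a = classify_tuple((2, 3), excl, over)
result_b = classify_tuple((1, 2), excl, over)
result_c = classify_tuple((1, 1), excl, over)
```

In this toy case the tuple (2, 3) meets (11) for every subset, (1, 2) meets only (10), and (1, 1) falls short of the total of three measurements required by the full ensemble.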

Discussion: The theorems can also be applied to the single sensor and joint measurement settings. In the single
signal setting, we will have $x = P\theta$ with $\theta \in \mathbb{R}^K$ and $\Lambda = \{1\}$; the theorem provides
the requirement $M \ge K + 1$, which matches the existing requirements for reconstruction.

The joint measurement setting is equivalent to the single signal setting with a dense measurement matrix, as
all measurements are dependent on all signal entries. In this case, however, the distribution of the measurements
among the available sensors is irrelevant. Therefore, we only obtain a condition on the total number of measurements
obtained by the group of sensors as $\sum_{j \in \Lambda} M_j \ge D + 1$.

3.4 Future work

The use of graphical models allowed us to derive bounds on the number of measurements necessary to recover a
signal ensemble in a multi-sensor compressive sensing setting. A natural next step is to apply the insights of our
graphical model framework to a variety of sensing scenarios. The recovery technique defined above requires an
$\ell_0$ minimization, so investigating a convex optimization method and its associated measurement rates is also a
direction worth pursuing. Finally, although our graphical model approach is a step in the right direction, we still
hope to develop an overarching theory of rate-distortion analysis for distributed compressive sensing.

4  Compressive  sensing  radar 

4.1  Summary  of  results 

We  took  the  principles  of  CS  and  applied  them  to  both  1-D  ranging  radar  and  2-D  imaging  radar.  We  demonstrated 
that  CS  has  the  potential  to  make  two  significant  improvements  to  radar  systems  by  (i)  eliminating  the  need  for  the 
pulse  compression  matched  filter  at  the  receiver,  and  (ii)  reducing  the  required  receiver  analog-to-digital  conversion 
bandwidth  so  that  it  need  operate  only  at  the  radar  reflectivity’s  potentially  low  “information  rate”  rather  than  at 
its  potentially  high  Nyquist  rate.  These  ideas  could  enable  the  design  of  new,  simplified  radar  systems,  shifting  the 
emphasis  from  expensive  receiver  hardware  to  smart  signal  recovery  algorithms.  We  formalized  our  approach  and 
used it to accurately recover a 1-D range profile using only 50 percent of the measurements that Nyquist-rate
sampling would dictate, and to recover a 2-D SAR test image using only 25 percent of the measurements that would
otherwise have been needed.

Figure 6: Prototypical radar transmitter.


Figure  7:  Prototypical  digital  radar  receivers  for  the  transmitter  in  Fig.  6  perform  matched  filtering  either  in  the  (a)  analog 
or  (b)  digital  domain. 

4.2  CS-based  radar 

In order to illustrate our CS-based radar concept, consider a simplified 1-D range imaging model of a target
described by $u(r)$ with range variable $r$. If we let the transmitted radar pulse $s_T(t)$ interact with the target
by means of a linear convolution [20], then the received radar signal $s_R(t)$ is given by

$$s_R(t) = A \int s_T(t - \tau)\, u(\tau)\, d\tau, \tag{13}$$

where we have converted the range variable $r$ to time $t$ using $t = 2r/c$, with $c$ the propagation velocity of
light, and where $A$ represents attenuation due to propagation and reflection. If the transmitted signal has the
property that $s_T(t) * s_T(-t) \approx \delta(t)$ (which is true for PN and chirp signals), then a band-limited
measurement of the radar reflectivity $u(t)$ can be obtained by pulse compression, that is, by correlating $s_R(t)$
with $s_T(t)$ in a matched filter (recall Fig. 7) [20]. A/D conversion occurs either before or after the matched
filtering, resulting in $N$ Nyquist-rate samples.
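
The approximate delta-like autocorrelation that pulse compression relies on can be checked numerically for a PN sequence. This is a rough discrete-time sketch, assuming a random Bernoulli $\pm 1$ sequence as the PN code; the length, seed, and function name `autocorrelation` are illustrative choices.

```python
import random

def autocorrelation(p):
    """Linear autocorrelation r(k) = sum_n p(n) p(n + k) for lags k >= 0."""
    N = len(p)
    return [sum(p[n] * p[n + k] for n in range(N - k)) for k in range(N)]

rng = random.Random(1)      # fixed seed, for repeatability only
N = 256                     # code length (hypothetical)
pn = [rng.choice((-1, 1)) for _ in range(N)]

r = autocorrelation(pn)
peak = r[0]                                   # equals N for a +/-1 code
max_sidelobe = max(abs(v) for v in r[1:])     # typically on the order of sqrt(N)
```

The zero-lag peak grows linearly with the code length while the sidelobes grow only like its square root, which is the sense in which the autocorrelation approximates a delta function.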

Our CS-based radar approach is based on two key observations. First, the target reflectivity functions $u(t)$ that
we wish to obtain through the radar process are often sparse or compressible in some basis. For example, a set
of $K$ point targets corresponds to a sparse sum of delta functions as in
$u(t) = \sum_{i=1}^{K} a_i \delta(t - \kappa_i)$; smooth targets are sparse in the Fourier or wavelet domain; and
range-Doppler reflectivities are often sparse in the joint time-frequency (or ambiguity) domain [20]. Such target
reflectivity functions $u(t)$ are good candidates for acquisition via CS techniques.

Second, time-translated and frequency-modulated versions of the PN or chirp signals transmitted as radar
waveforms $s_T(t)$ form a dictionary (the extension of a basis or frame) that is incoherent with the time, frequency,
and time-frequency bases that sparsify or compress the above-mentioned classes of target reflectivity functions
$u(t)$ [21]. This means that PN or chirp signals are good candidates for the rows of a CS acquisition matrix $\Phi$
as a “random filter” where

$$y(m) = \sum_{n=1}^{N} p(Dm - n)\, x(n) \tag{14}$$

for $m = 1, \ldots, M$.
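
A direct transcription of (14) is sketched below. Two points are assumptions rather than part of the text: $p(k)$ is taken to be zero whenever $k$ falls outside $1, \ldots, N$, and the downsampling factor is $D = \lfloor N/M \rfloor$. The function name `random_filter_measure` and the example vectors are hypothetical.

```python
def random_filter_measure(p, x, M):
    """Compute y(m) = sum_{n=1}^{N} p(D*m - n) x(n) for m = 1, ..., M.

    The text indexes p and x from 1; index k maps to list position k - 1.
    p(k) is treated as 0 outside 1..len(p), which is an assumption."""
    N = len(x)
    D = N // M

    def p_at(k):
        return p[k - 1] if 1 <= k <= len(p) else 0

    return [sum(p_at(D * m - n) * x[n - 1] for n in range(1, N + 1))
            for m in range(1, M + 1)]

p = [1, -1, 1, 1, -1, -1, 1, -1]   # example +/-1 code, N = 8
x = [0, 0, 2, 0, 0, 0, 0, 0]       # a single reflector at n = 3
y = random_filter_measure(p, x, M=4)   # D = 2, so y(m) = 2 * p(2m - 3)
```

With a single nonzero entry in `x`, each measurement is just a scaled, shifted sample of the code, which makes the convolve-then-downsample structure easy to see.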

By  combining  these  observations  we  can  both  eliminate  the  matched  filter  in  the  radar  receiver  and  lower  the 
receiver  A/D  converter  bandwidth  using  CS  principles.  Consider  a  new  design  for  a  radar  system  that  consists  of 
the  following  components.  The  transmitter  is  the  same  as  in  a  classical  radar;  the  transmit  antenna  emits  a  PN 
or chirp signal $s_T(t)$ (recall Fig. 6). However, the receiver does not consist of a matched filter and high-rate A/D
converter  but  rather  only  a  low-rate  A/D  converter  that  operates  not  at  the  Nyquist  rate  but  at  a  rate  proportional 
to  the  target  reflectivity’s  compressibility  (see  Fig.  8). 

Figure 8: Compressive radar receiver for the transmitter in Fig. 6 performs neither matched filtering nor high-rate
analog-to-digital conversion. [Block diagram labels: $s_R(t)$ from receive antenna; low-rate A/D; processing.]

Figure 9: CS radar example. (a) Transmitted PN pulse $s_T(t)$, (b) low-rate measurement $y$, and (c) true and
recovered reflectivity profiles $u(t)$.

We make the connection explicit for a PN-based CS radar with a simple sampling model. Consider a target
reflectivity generated from $N$ Nyquist-rate samples $x(n)$, $n = 1, \ldots, N$, via $u(t) = x(\lceil t/\Delta \rceil)$
on the time interval of interest $0 \le t \le N\Delta$. The radar transmits a PN signal generated from a length-$N$
random Bernoulli $\pm 1$ vector $p(n)$ via $s_T(t) = p(\lceil t/\Delta \rceil)$. The received radar signal $s_R(t)$ is
given by (13); we sample it not every $\Delta$ seconds but rather every $D\Delta$ seconds, where
$D = \lfloor N/M \rfloor$ and $M < N$, to obtain the $M$ samples, $m = 1, \ldots, M$,

$$y(m) = s_R(t)\big|_{t = mD\Delta} = A \int_0^{N\Delta} s_T(mD\Delta - \tau)\, u(\tau)\, d\tau = A \sum_{n=1}^{N} p(mD - n) \int_{(n-1)\Delta}^{n\Delta} u(\tau)\, d\tau = A \sum_{n=1}^{N} p(mD - n)\, x(n), \tag{15}$$

which is precisely a scaled version of (14). In words, a PN sequence radar implements a random filter in the sense
of [21], and hence the low-rate samples $y$ contain sufficient information to reconstruct the signal $x$ corresponding
to the Nyquist-rate samples of the reflectivity $u(t)$ via linear programming or a greedy algorithm. Chirp pulses
yield similar results.

Figure  9  illustrates  the  scheme  in  action.  A  radar  reflectivity  profile  is  probed  with  a  PN  pulse  sequence,  measured 
at  one-half  the  Nyquist  sampling  rate,  and  subsequently  recovered  exactly  using  an  OMP  greedy  algorithm  and  a 
sparsity frame $\Psi$ combining delta spikes and Haar wavelets.
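
For readers unfamiliar with OMP, a generic pure-Python sketch of the greedy recovery step is given below. It is not the configuration used for Figure 9: it assumes sparsity in the canonical basis rather than the delta-plus-Haar frame, uses a dense Bernoulli $\pm 1$ matrix as a stand-in for the PN random filter, and all names, sizes, and the seed are hypothetical.

```python
import math
import random

def solve_linear(G, b):
    """Solve G c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(G[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    c = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * c[j] for j in range(i + 1, n))
        c[i] = s / aug[i][i]
    return c

def omp(A, y, K, tol=1e-10):
    """Orthogonal matching pursuit: greedily select up to K columns of A."""
    M, N = len(A), len(A[0])
    cols = [[A[m][n] for m in range(M)] for n in range(N)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    support, coef = [], []
    residual = list(y)
    for _ in range(K):
        if math.sqrt(sum(r * r for r in residual)) <= tol:
            break
        # Pick the unused column most correlated with the residual.
        candidates = [n for n in range(N) if n not in support and norms[n] > 0]
        best = max(candidates, key=lambda n: abs(
            sum(c * r for c, r in zip(cols[n], residual))) / norms[n])
        support.append(best)
        # Least squares on the current support via the normal equations.
        G = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in support]
             for i in support]
        rhs = [sum(a * v for a, v in zip(cols[i], y)) for i in support]
        coef = solve_linear(G, rhs)
        residual = [y[m] - sum(c * cols[i][m] for c, i in zip(coef, support))
                    for m in range(M)]
    x_hat = [0.0] * N
    for c, i in zip(coef, support):
        x_hat[i] = c
    return x_hat, support

rng = random.Random(2)      # fixed seed, for repeatability only
M, N, K = 16, 32, 3         # half-rate sampling of a 3-sparse profile
A = [[rng.choice((-1.0, 1.0)) for _ in range(N)] for _ in range(M)]
x_true = [0.0] * N
x_true[4], x_true[11], x_true[27] = 1.0, -2.0, 1.5
y = [sum(a * v for a, v in zip(row, x_true)) for row in A]
x_hat, support = omp(A, y, K)
```

Whether a given run recovers `x_true` exactly depends on the matrix draw and the sparsity level; the sketch is meant to show the select, project, and update loop rather than to reproduce the reported results.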

Additional gains can be expected for 2-D CS radar imaging. We illustrate this with a simple simulation of SAR
data acquisition and imaging. Figure 10(a) shows the reflectivity function that is to be recovered from the SAR data.
We simulated a SAR data acquisition using the method described in [22]. Figure 10(b) shows the result of a 2-D
CS implementation with four times undersampling, which gives an exact recovery of the reflectivity function. The
traditional SAR image (Fig. 10(c)) shows artifacts of the limited aperture of the imaging operator, which are absent
in the CS image. The result is similar to what is obtained with the feature-enhanced imaging approach of [23].
However, the CS-based approach has some advantages, such as an almost infinite number of sparse representations
to choose from as well as more efficient signal recovery algorithms [24].

Figure 10: CS synthetic aperture radar (SAR) example. (a) 2-D reflectivity, (b) CS SAR image, and (c) traditional
SAR image.

4.3  Future  work 

The  initial  success  of  our  approach  leads  us  to  believe  that  CS  principles  can  be  applied  beyond  the  area  of  complete 
signal  recovery.  We  showed  that  the  information  scalability  of  CS  allows  for  a  much  wider  range  of  statistical 
inference tasks. Detection, classification, and recognition would all be useful applications for radar. That they
require even fewer measurements than complete reconstruction is a further benefit.

Though  we  showed  the  benefits  of  CS-based  radar,  there  are  a  number  of  challenges  to  be  overcome  before  an 
actual  CS-based  radar  system  will  become  a  reality.  First,  the  target  reflectivity  being  probed  must  be  compressible 
in  some  basis,  frame,  or  dictionary.  Second,  the  signal  recovery  algorithms  must  be  able  to  handle  real-world  radar 
acquisition  scenarios  with  sufficient  computational  efficiency  and  robust  performance  for  noisy  data.  Third,  there  is 
a subtle tradeoff to optimize between the reduction in sampling rate $\lfloor N/M \rfloor$ and the dynamic range of the resulting
CS  system  [8].  These  are  areas  of  active  research  for  both  our  team  and  the  broader  CS  community.  In  particular, 
there  could  be  links  with  recent  work  on  finite  rate  of  innovation  sampling  for  ultrawideband  communication 
systems  [25]. 